Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta learning based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose Selection-Enhanced Noisy label Training (SENT) that does not rely on meta learning while having the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under the settings of self-training and label corruption.
translated by 谷歌翻译
用深度学习方法证明自动定理最近引起了注意。在本文中,我们为三角身份构建了一个自动证明系统。我们定义了三角身份的归一化形式,为证明设计一组规则,并提出了一种可以生成理论上无限三角身份的方法。我们的目标不仅是完成证明,而且要在尽可能少的步骤中完成证明。因此,我们设计了一个模型来学习由随机BFS(RBF)生成的证明数据,并且在理论上和实验上证明了该模型在简单的模仿学习后可以胜过RBF。通过增强学习进一步改进,我们获得了Autotrig,这可以在几乎与BFS(理论上最短的方法)中为身份提供证明步骤,而时间成本仅为千分之一。此外,Autotrig在合成数据集中还击败Sympy,Matlab和人类,并且在许多概括任务中表现良好。
translated by 谷歌翻译
由于数据注释的高成本,半监督行动识别是一个具有挑战性的,但重要的任务是。这个问题的常见方法是用伪标签分配未标记的数据,然后将其作为训练中的额外监督。通常在最近的工作中,通过在标记数据上训练模型来获得伪标签,然后使用模型的自信预测来教授自己。在这项工作中,我们提出了一种更有效的伪标签方案,称为跨模型伪标记(CMPL)。具体地,除了主要骨干内,我们还介绍轻量级辅助网络,并要求他们互相预测伪标签。我们观察到,由于其不同的结构偏差,这两种模型倾向于学习来自同一视频剪辑的互补表示。因此,通过利用跨模型预测作为监督,每个模型都可以受益于其对应物。对不同数据分区协议的实验表明我们对现有替代方案框架的重大改进。例如,CMPL在Kinetics-400和UCF-101上实现了17.6 \%$ 17.6 \%$ 25.1 \%$ 25.使用RGB模态和1 \%$标签数据,优于我们的基线模型,FIXMATCT,以$ 9.0 \% $和10.3美元\%$。
translated by 谷歌翻译
量子噪声是嘈杂中间级量子(NISQ)计算机中的关键挑战。以前的缓解噪声的工作主要集中在门级或脉冲级噪声自适应编译。然而,有限的研究工作通过使量子电路本身对噪声具有更高的优化级别。我们提出了Quoutumnas,是变分电路和量子位映射的噪声自适应共同搜索的全面框架。变形量子电路是构建QML和量子仿真的有希望的方法。然而,由于大型设计空间和参数训练成本,找到最佳变分电路及其最佳参数是具有挑战性的。我们建议通过引入新的超级速度来解耦电路搜索和参数培训。超电路由多层预定的参数化栅极构成,并通过迭代采样和更新其的参数子集(Subcircuit)训练。它提供了从头开始培训的子通差形性能的准确估计。然后我们执行Subcircuit的演进共同搜索和其量子位映射。使用从超级电路继承的参数和使用真实设备噪声模型进行估计,估计子电路性能。最后,我们执行迭代栅极修剪和FineTuning以去除冗余栅极。在10个量子计算上广泛评估了12个QML和VQE基准,Quoutumnas显着优于基线。对于QML,Quoutumnas是第一个展示超过95%的2级,85%的4级和真实QC的32%的10级分类准确性。与UCCSD相比,它还实现了H2,H2O,LIH,CH4,BEH2上的VQE任务的最低特征值。我们还开源Quantumengine(https://github.com/mit-han-lab/pytorch-quantum),用于快速训练参数化量子电路,以促进未来的研究。
translated by 谷歌翻译
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1 st on the competitive SemanticKITTI leaderboard upon publication. It also achieves 8× computation reduction and 3× measured speedup over MinkowskiNet still with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.
translated by 谷歌翻译
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off among accuracy, latency, energy, and model size, which is both timeconsuming and sub-optimal. There are plenty of specialized hardware for neural networks, but little research has been done for specialized neural network optimization for a particular hardware architecture. Conventional quantization algorithm ignores the different hardware architectures and quantizes all the layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework which leverages the reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95× and the energy consumption by 1.9× with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy and model size) are drastically different. We interpreted the implication of different quantization policies, which offer insights for both neural network architecture design and hardware architecture design. * indicates equal contributions. 68 69 70 71 72 73 25 44 63 82 101 120 MobileNets (fixed 8-bit quantization) MobileNets (our flexible-bit quantization) Latency (ms) Top-1 Accuracy (%) 1MB 2MB 3MB Model Size:Figure 1: We need mixed precision for different layers. We quantize MobileNets [12] to different number of bits (both weights and activations), and it lies on a better pareto curve (yellow) than fixed bit quantization (blue). The reason is that different layers have different redundancy and have different arithmetic intensity (OPs/byte) on the hardware, which advocates for using mixed precision for different layers.
translated by 谷歌翻译
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections. In this paper, we find 99.9% of the gradient exchange in distributed SGD are redundant, and propose Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth. To preserve accuracy during this compression, DGC employs four methods: momentum correction, local gradient clipping, momentum factor masking, and warm-up training. We have applied Deep Gradient Compression to image classification, speech recognition, and language modeling with multiple datasets including Cifar10, ImageNet, Penn Treebank, and Librispeech Corpus. On these scenarios, Deep Gradient Compression achieves a gradient compression ratio from 270× to 600× without losing accuracy, cutting the gradient size of ResNet-50 from 97MB to 0.35MB, and for DeepSpeech from 488MB to 0.74MB. Deep gradient compression enables large-scale distributed training on inexpensive commodity 1Gbps Ethernet and facilitates distributed training on mobile. The code is available at: https://github.com/synxlin/ deep-gradient-compression.
translated by 谷歌翻译
Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3Daware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes without semantic annotation) as the scene layout prior, which is simple to obtain, general to describe various scene contents, and yet informative to disentangle objects and background. Moreover, it serves as an intuitive user control for scene editing. Based on such a prior, the proposed model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with the global-local discrimination. Our model obtains the generation fidelity and editing flexibility of individual objects while being able to efficiently compose objects and the background into a complete scene. We demonstrate state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset. Project page: https://snap-research.github.io/discoscene/
translated by 谷歌翻译
Video generation requires synthesizing consistent and persistent frames with dynamic content over time. This work investigates modeling the temporal relations for composing video with arbitrary length, from a few frames to even infinite, using generative adversarial networks (GANs). First, towards composing adjacent frames, we show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality. Second, by incorporating the temporal shift module (TSM), originally designed for video understanding, into the discriminator, we manage to advance the generator in synthesizing more consistent dynamics. Third, we develop a novel B-Spline based motion representation to ensure temporal smoothness to achieve infinite-length video generation. It can go beyond the frame number used in training. A low-rank temporal modulation is also proposed to alleviate repeating contents for long video generation. We evaluate our approach on various datasets and show substantial improvements over video generation baselines. Code and models will be publicly available at https://genforce.github.io/StyleSV.
translated by 谷歌翻译
Our work targets at searching feasible adversarial perturbation to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem where the exploration space becomes explosively larger as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic method, such as Branch-and-Bound, suffers from exponential complexity, yet can bring arbitrarily bad attack results. We address the challenge via the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in multi-armed bandit programming. Our objective is to achieve highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical data sets of different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT applied in different application scenarios. Our work further hints the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
translated by 谷歌翻译